The analysis below applies the SOGP framework to investigate Covid-19 excess deaths. To do so we utilize the recently published short-term mortality fluctuation (STMF) dataset published by the Human Mortality Database. The STMF was expressly created to accomodate the critical need of short-term mortality analyses in response to the Covid-19 pandamic. The STMF contains detailed country-specific mortality, segregated by week (52 weeks per year), sex, and 3 age groups in 15 countries:

Austria (AUT)
Belgium (BEL)
Bulgaria (BGR)
Czech Republic (CZE)
Germany (DEUTNP)
Denmark (DNK)
Spain (ESP)
Finland (FIN)

England and Wales (GBRTENW)
Iceland (ISL)
Netherlands (NLD)
Norway (NOR)
Portugal (PRT)
Sweden (SWE)
United States (USA)

The dataset is frequently updated. The analysis below is based on datasets last updated on May 26, 2020:

deaths <- readr::read_csv("https://www.mortality.org/Public/STMF/Outputs/stmf.csv", skip=1)

For our analysis we work with mortality rates (provided in STMF as the RTotal field) transformed to the log scale. We furthermore apply a few adjustments:

deaths <- data.table(deaths)
deaths <- deaths[Sex=="b",.(Year,Week,RTotal,DTotal,CountryCode)] 
deaths <- deaths[,y:=log(RTotal)]

\(\color{#c64329}{\textbf{Gaussian Process Regression in estimating 2020 excess deaths}}\)

Our primary goal is to estimate excess mortality during 2020. This is defined as the difference between the observed and the expected number of deaths. Epidemiologists use excess mortality as a standard tool to understand the full impact of the pandemic. It does not only reflect deaths dirctly due to Covid-19 but also other causes associated with the pandemic, such as delayed or forgone care due to hospitals prioritizing Covid-19 emergencies or patients themselves staying away.

To define excess mortality requires agreeing on the baseline, i.e. the expected number of deaths if Covid-19 did not occur. A very simple method is to look at historical weekly mortality and then average. For example, one common metric is the median mortality over the previous 5 years, 2015-2019. However, this approach ignores trends in the data (specifically the mortality improvement factors whereby mortality gets better YoY), is sensitive to data outliers, and most importantly does not provide good uncertainty quantification. UQ is critical to answer the question of whether observed death counts are statistically significant or can be attributed to “random” fluctuations. Mortality data is intrinsically volatile and it is not straightforward to distinguish between frequent fluctuations that could happen versus an unambiguous Covid-19 spike.

Our analysis uses statistical learning, namely GP models to answer the above. To do so, we build a GP model on the historical data, taking for our training set Years 2010-2019, as well as Weeks 1-5 of 2020. We then use this training set to make probabilistic forecasts for Weeks 6-present of 2020 and compare this forecast to the observed mortality. The vertical blue line in the figure below delineates our training set from the forecast period.

The Figure below displays the raw weekly log-mortality rates (after adjustments listed above) for each country indexed by weeks of the year. The 2020 observed log-mortality series is highlighted in red.

To apply the Gaussian process modeling, we treat data as a 2-dimensional table, indexed by Weeks (1-52, coordinate 1) and Years (2010-2020, coordinate 2):

Our goal is to use a Gaussian Process model to:

  1. smooth log-mortality rates in Year and Week dimensions;

  2. perform out-of-sample predictions in 2020 starting from Week 5 to Week 22 (end of May 2020).

Note that GP is effectively a spatial model and does not have any time-series concepts embedded; it treats retrospective and future forecasts identically.

\(\color{#c64329}{\textbf{Sample Gaussian Process Regression for Sweden all-Age Mortality}}\) Below is an illustration on the application of Gaussian Process to analyze and forecast time-series data in Sweden.

 

\(\color{#c64329}{\textbf{Excess mortality}}\)

We can compare the GP-based mean forecast for 2020 against the observed weekly log-mortality rates to identify excess mortality. In the Figure below, the shaded region is the excess mortality starting at Week 11 (early March). Some countries have experienced much more severe impact than others. Observe that 2020 data ends at different weeks in different nations. We also note that in most countries there is a strong seasonal effect whereby mortality is expected to decrease as we enter the Summer months.

 

\(\color{#c64329}{\textbf{Credible Bands}}\)

To highlight the uncertainty quantification available with a GP model, we plot below the credible intervals around our GP-based 2020 forecasts. Below, we plot the mean forecasts for each country along with the 80% and 95% credible bands. We also show the 5-year medians and the observed 2020 rate for comparison purpose.

 

 

 

We note that in several countries, there is no statistically significant deviation from expected mortality. In other cases, such as Spain and USA, the Covid-19 spikes are clear.

A few remarks can be made relative to the historical 5-year median:

\(\color{#c64329}{\textbf{Excess deaths}}\)

Country Total Excess Deaths in 2020 Last Week Reported
Austria 240 18
Belarus 7654 19
Bulgaria -1855 19
Czech Republic -119 14
Germany -2076 17
Denmark 68 20
Spain 25578 20
Finland 7 18
England & Wales 49544 20
Netherlands 8227 19
Norway 117 17
Portugal 267 18
Sweden 3818 19
USA 50585 17

\(\color{#c64329}{\textbf{GP Diagnostics}}\)

For completeness, we report below the fitted GP mean functions for each country, as well as the GP hyperparameters. We note that our basic model had difficulty finding appropriate lengthscales for E&W and was fitted manually for now.

Intercept Year trend A_1 A_2 A_3 A_4
AUT -6.002 0.0007 0.0785 0.0531 0.0036 0.0306
BEL -1.505 -0.0016 0.0917 0.0720 0.0095 0.0261
BGR -17.942 0.0068 0.0936 0.0650 0.0275 0.0153
CZE -12.067 0.0037 0.0643 0.0478 -0.0024 0.0273
DEUTNP -16.110 0.0058 0.0786 0.0645 -0.0067 0.0334
DNK 3.340 -0.0040 0.0677 0.0569 0.0047 0.0194
ESP -25.824 0.0105 0.1033 0.0809 0.0348 0.0420
FIN -13.864 0.0046 0.0636 0.0436 -0.0009 0.0116
GBRTENW -13.563 0.0044 0.1077 0.0152 0.0161 -0.0019
NLD -23.017 0.0091 0.0753 0.0570 0.0063 0.0169
NOR 20.756 -0.0127 0.0857 0.0435 0.0117 0.0193
PRT -28.555 0.0119 0.1410 0.0911 0.0403 0.0504
SWE 23.271 -0.0139 0.0887 0.0569 0.0004 0.0229
USA -25.085 0.0101 0.0719 0.0364 0.0149 0.0059

Hyper-parameters in the covariance function for each country:

Year Lengthscale Week Lengthscale Process variance Observ. Noise
AUT 0.0663 3.877 0.0019 0.0015
BEL 0.0651 1.979 0.0021 0.0012
BGR 0.0368 1.662 0.0027 0.0011
CZE 0.0124 3.197 0.0018 0.0013
DEUTNP 0.2715 1.858 0.0023 0.0005
DNK 0.0061 2.855 0.0012 0.0011
ESP 0.0000 2.007 0.0040 0.0008
FIN 0.3633 1.705 0.0013 0.0011
GBRTENW 0.1092 2.815 0.0031 0.0036
NLD 0.0759 3.511 0.0015 0.0007
NOR 0.0000 4.550 0.0008 0.0017
PRT 0.0737 1.809 0.0030 0.0014
SWE 0.0534 2.568 0.0011 0.0009
USA 0.3949 5.022 0.0009 0.0001